ABSTRACT

Formatted input-output is available in a number of programming
languages. In the most general case, the correspondence between data
items and format items cannot be determined during compilation, and so
it is determined dynamically during execution. However, in most pairs
of data and format lists that occur in practice, determination of the
correspondence is in fact possible during compilation. Although some
commercial compilers make this determination, there is little published
literature on the subject. In this paper, we briefly examine three
areas in which compile-time determination of the data-format correspondence
is useful: optimization, program validation, and automatic test data
generation. A formalism for stating the problem is given, and a solution
is discussed in terms of formal language theory. Using this formalism,
an algorithm for determining the correspondence is given, and its
application is illustrated by examples in both PL/I and FORTRAN.


Keywords  and  key  phrases:   formats,  compilers,  program  optimization, 
program  validation,  test  data  generation,  input-output,  static  program 
analysis.


1.   Introduction 

Formatted input-output plays an important role in FORTRAN and PL/I,
and is also provided in Algol 68 and in certain Algol extensions. A
formatted  input-output  operation  is  specified  by  providing  a  data  list 
and  a  format  list.   The  data  list  specifies  the  items  to  be  read  or  written, 
while  the  format  list  specifies  how  the  items  are  represented  on  the 
input  or  output  medium.   In  most  cases  that  occur  in  practice,  it  is 
possible  to  determine  during  compilation  how  data  items  are  paired  with 
format items. This is not surprising, since the programmer should have
anticipated  the  correspondence  when  the  data  list  and  format  list  were 
composed.   Consolidating  the  two  lists  into  one  list  of  pairs  eliminates 
the  need  for  expensive  execution-time  linkage  mechanisms,  and  moreover 
makes  it  possible  to  derive  information  useful  in  program  validation  and 
in  automatic  test  data  generation.   In  this  paper,  we  present  an  algorithm 
for  converting  the  two  lists  into  a  single  list  of  pairs.   Although  the 
conversion is trivial if the lists are expanded by writing out all
iterations  in  full,  it  is  not  trivial  if  we  desire  to  retain  as  much  of 
the  original  iteration  structure  as  possible,  which  our  algorithm  does. 

In some cases, our algorithm rejects the input because the correspondence
cannot be determined until execution. For example, consider the
FORTRAN  statements: 

      WRITE (1,5) (A(I),I = 1,M),B
5     FORMAT (E14.3,E12.3)

We  cannot  know  prior  to  execution  whether  B  will  correspond  to  the  format 
E14.3  or  to  the  format  E12.3,  since  that  depends  on  whether  M  is  even 
or  odd. 
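
As an illustration of why the pairing for B depends on M, the following Python sketch pairs the fully expanded data list with the format items. It is only a sketch under stated assumptions: the names pair_by_expansion and format_for_b are hypothetical, record boundaries are ignored, and we assume that a format with no nested parenthesized groups is simply reused from its beginning once it is exhausted.

```python
from itertools import cycle


def pair_by_expansion(data_items, format_items):
    """Pair each data item with a format item, reusing the format cyclically.

    Assumption: the format has no nested parenthesized groups, so reuse
    always restarts at the first format item.
    """
    return list(zip(data_items, cycle(format_items)))


def format_for_b(m):
    """Return the format item that B receives when the implied DO runs 1..m."""
    data = [f"A({i})" for i in range(1, m + 1)] + ["B"]
    pairs = pair_by_expansion(data, ["E14.3", "E12.3"])
    return pairs[-1][1]


if __name__ == "__main__":
    for m in (2, 3):
        print(m, format_for_b(m))
```

With an even M the item B falls on E14.3, and with an odd M it falls on E12.3, which is exactly the information a compiler lacks when M is not known until execution.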


In this paper, we shall consider the application of our algorithm
to FORTRAN and PL/I; we have not attempted to apply it to other languages.
The  current  status  of  our  work  is  that  the  algorithm  has  been  programmed 
(in  SNOBOL)  and  tested,  but  it  has  not  been  incorporated  into  an  actual 
compiler.   Although  the  SNOBOL  implementation  is  effective  for  testing 
and  experimentation,  a  practical  implementation  would  necessarily  use 
lists  rather  than  character  strings  as  its  underlying  representation. 

2.   Applications

The major application of our algorithm is the optimization of
formatted input-output. Ordinarily the execution of a formatted input-output
statement is implemented by a pair of coroutines, one for the
data  list  and  one  for  the  format  list.   Each  coroutine  keeps  track  of 
the  position  in  the  list,  and  finds  the  next  item  when  it  is  called. 
Control  shuttles  back  and  forth  between  the  two  routines,  and  when  a 
data-format pair is obtained, the appropriate input or output action is
taken. The code required for this coroutine exchange can be eliminated if the
correspondence is known in advance, since then the proper format can be
compiled directly into the data list. Knuth's study of FORTRAN programs
[10]  found  that  about  25%  of  the  overall  execution  time  was  spent  in  the 
I/O  editing  routines.   Therefore,  we  expect  compile-time  analysis  to 
produce  a  noticeable  reduction  in  execution  time. 

The  correspondence  between  the  data  list  and  format  list  can  be  used 
in program validation to detect certain types of programming errors. For
instance,  we  can  check  whether  the  type  of  each  data  item  agrees  with 
the type of the corresponding format item. When making this check, we
would want to ignore certain distinctions among format items. For
instance, in the example given earlier, the formats E14.3 and E12.3
would  be  treated  as  one  and  the  same  since  they  both  match  variables  of 
type  REAL.   Since  formats  are  a  major  source  of  errors  for  beginning 
FORTRAN  programmers,  this  check  would  be  valuable  in  diagnostic  FORTRAN 
compilers.   In  PL/I,  however,  all  printable  data  types  convert  to  all 
other  printable  data  types,  so  formats  are  always  correct  from  this 
viewpoint. 

Once  the  correspondence  between  a  data  item  and  a  format  item  is 
known,  then  a  range  of  permissible  values  for  the  data  item  is  also  known. 
This  information  can  be  useful  for  more  sophisticated  validations.   For 
example, suppose we have the FORTRAN sequence:

      WRITE (5,10) N
10    FORMAT (I2)

A warning should be issued if N is potentially greater than 99 or less
than -9. Recent work in static program analysis [1,4,5] should be
useful in this type of validation.

In  automatic  test  data  generation,  a  topic  investigated  by  Clarke 
in [3], an attempt is made to determine legal input data that will exercise
particular  program  paths.   Knowing  the  format  specifications  of  a  data 
item  determines  a  range  of  potential  values  for  the  data  item.   This  in 
turn  may  limit  the  range  of  other  variables.   For  example,  in  the  following 
FORTRAN  sequence: 

      READ (5,10) I,J
10    FORMAT (I2,I3)
      K = I + J

we note that the possible range of legal values for I is -9 to 99 and
for J is -99 to 999, while the possible range of values for K is -108
to 1098.
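
The arithmetic behind these ranges can be sketched in a few lines of Python. This is an illustration rather than part of any existing system: the function names are hypothetical, and we assume an integer field of width w holds at most w digits, with a minus sign taking one of the columns. Field widths 2 and 3 reproduce the ranges quoted above for I and J.

```python
def integer_field_range(width):
    """Return (lowest, highest) value that fits in an integer field of `width` columns.

    Assumption: a negative value spends one column on its minus sign.
    """
    if width < 1:
        raise ValueError("field width must be positive")
    lowest = -(10 ** (width - 1) - 1)
    highest = 10 ** width - 1
    return lowest, highest


def sum_range(a, b):
    """Range of x + y when x lies in range a and y lies in range b."""
    return a[0] + b[0], a[1] + b[1]


i_range = integer_field_range(2)      # field of width 2 for I
j_range = integer_field_range(3)      # field of width 3 for J
k_range = sum_range(i_range, j_range)  # K = I + J

if __name__ == "__main__":
    print(i_range, j_range, k_range)
```

A test data generator could intersect such ranges with the path constraints it derives, instead of discovering a conflict with the format only after the data has been generated.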

Current  test  data  generation  systems  have  ignored  format  information, 
even though FORTRAN and PL/I have been the languages most frequently
analyzed by such systems [3,7,9,12,13]. Of course, it is possible to
compare the generated data with the corresponding format statements,
using  the  coroutines  approach  mentioned  in  connection  with  optimization. 
However,  if  the  generated  data  is  not  consistent  with  the  corresponding 
format,  an  expensive  reanalysis  is  necessary.   It  would  be  more  economical 
to extract the data-format correspondence before analysis.

The optimization aspect of formatted input-output is touched upon by
Lee in his FORTRAN-oriented book on compiler writing [11]. Moreover, the
IBM  PL/I  Optimizing  Compiler  [8]  does  match  data  lists  with  format  lists. 
However,  the  circumstances  under  which  this  matching  is  done,  and  the 
method  used  to  accomplish  it,  are  proprietary.*   Although  other  commercial 
compilers  also  perform  this  matching,  we  are  not  aware  of  any  published 
literature  about  them.   Torsun  and  Robinson  [14]  have  developed  a  system 
that preprocesses formats, but their system does not perform any compile-time
analysis on data lists that contain iterations. Their discussion
deals  mostly  with  the  numerical  encoding  of  formats,  and  has  little  to 
say  about  the  problems  considered  here. 

*We wish to emphasize that the methods developed in this paper were devised
without any knowledge of the IBM method, as neither of us has access to it.

3.   Notation

From now on, we will refer to format items as F-items and to data
items as D-items. Similarly, a format list will be referred to as an
F-list, and a data list as a D-list. We distinguish three kinds of
repetition factors: constant, variable, and infinite. Constant repetition
factors are written explicitly. Variable repetition factors are denoted
by V_1, V_2, ..., V_s and infinite repetition factors by ∞. Essentially the
same formation rules, but with different individual items, can be used
for D-lists and F-lists:

(1)  An individual item is a component.

(2)  If x_1,x_2,...,x_k are components, then [x_1,x_2,...,x_k] is a
     nonrepeated sequence with subcomponents x_1,...,x_k.

(3)  If x_1,x_2,...,x_k are components and r is a constant, variable or
     infinite repetition factor, then r[x_1,x_2,...,x_k] is a repeated
     sequence with subcomponents x_1,x_2,...,x_k. A repeated sequence
     is a component.

(4)  If [x_1,x_2,...,x_k] is a nonrepeated sequence whose individual
     items are all D-items, then [x_1,x_2,...,x_k] is a D-list. If
     [x_1,x_2,...,x_k] is a nonrepeated sequence whose individual items
     are all F-items, then [x_1,x_2,...,∞[x_k]] is an F-list.

These rules require that sequences always appear with repetition factors
except at the outermost level. Thus, an individual item with a repetition
factor must be replaced by a unit list with that repetition factor (e.g., we
replace 4F_1 by 4[F_1]). The asymmetry in rule (4) is accounted for by the
facts that infinite repetition cannot occur in D-lists (except by error
in certain PL/I situations) and that any F-item following an infinite
repetition can just as well be ignored. For FORTRAN and PL/I, there are
further restrictions on F-lists. In FORTRAN, no variable repetition factor
can occur in the F-list and only the rightmost level-one parenthesized
list has an implied infinite repetition factor. In PL/I, infinite repetition
can be applied only to the entire F-list, so that k must be 1 in
rule (4).

Some  examples  are  in  order  to  show  how  the  notation  corresponds  to 
reality.   Consider  the  FORTRAN  example: 

      WRITE (5,100) ((A(I,J), J = 1,7), I, I = 1,M)
100   FORMAT (7E10.1, I3, (7E12.2,I2))

The D-list and F-list are then:

      [V_1[7[D_1],D_2]]

and

      [7[F_1],F_2,∞[7[F_3],F_4]]

respectively, with

      D_1 = A(I,J), D_2 = I, F_1 = E10.1, F_2 = I3, F_3 = E12.2, F_4 = I2, V_1 = M.

We have chosen to treat A(I,J) as a single item, although for certain
applications of test data generation, a finer distinction may be desirable.
A similar example in PL/I is:

      PUT EDIT (((A(I,J) DO J = 1 TO 7), I DO I = 1 TO M))
               ((7)E(10,1), F(3), (M-1)((7)E(12,2), F(2)))

The D-list is represented as in the FORTRAN example, but the F-list is

      [∞[7[F_1],F_2,V_2[7[F_3],F_4]]]

where V_2 = (M-1) and the other symbols are the same as before.
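
To make the notation concrete, the Python sketch below encodes the D-list and F-list of the FORTRAN example and expands them item by item for a given value of M. The encoding is our own assumption for illustration: an individual item is a string, a repeated sequence is a tuple (factor, components), a variable repetition factor is a string naming the variable, and math.inf stands for an infinite factor. The names expand and pairs_for are hypothetical.

```python
import math


def expand(components, bindings):
    """Yield the individual items of a nonrepeated sequence in order.

    `bindings` maps variable repetition factors (strings) to integer values.
    An infinite factor makes the generator endless, so callers must stop it.
    """
    for c in components:
        if isinstance(c, str):          # individual item
            yield c
            continue
        factor, body = c                # repeated sequence
        if isinstance(factor, str):     # variable repetition factor
            factor = bindings[factor]
        if factor == math.inf:
            while True:
                yield from expand(body, bindings)
        else:
            for _ in range(factor):
                yield from expand(body, bindings)


# [V_1[7[D_1],D_2]] and [7[F_1],F_2,∞[7[F_3],F_4]] in the assumed encoding
d_list = [("V1", [(7, ["D1"]), "D2"])]
f_list = [(7, ["F1"]), "F2", (math.inf, [(7, ["F3"]), "F4"])]


def pairs_for(m):
    """Pair every expanded D-item with the F-item in the same position."""
    data = list(expand(d_list, {"V1": m}))
    return list(zip(data, expand(f_list, {})))


if __name__ == "__main__":
    for pair in pairs_for(2):
        print(pair)
```

Full expansion of this kind makes the pairing trivial but discards the iteration structure, and it is possible here only because a value for M has been supplied.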

4.   The Correctness Problem in Terms of Formal Language Theory

Provided that the F-list contains no variable repetition factors, the
correctness problem can be shown to be solvable using results from formal
language theory. If variable repetition factors are present in the F-list,
and nothing is known about them, then formal language theory is of no
help. For consider:

      D-list:   [V_1[D_1],D_2]
      F-list:   [V_2[F_1],F_2]

Assume moreover that F_1 and F_2 are valid formats for D_1 and D_2
respectively. Even though the two lists have the same form, we cannot tell
whether they match correctly, since nothing relates V_1 to V_2.

If  there  are  no  variable  repetition  factors  in  the  F-list,  then  we 
can  transform  both  lists  into  regular  expressions  (see,  for  instance, 
Hopcroft  and  Ullman  [6])  as  follows: 

(1)  If D_i1, D_i2, ..., D_ik are the D-items that match F_j, then replace F_j
     by the expression

          (D_i1 ∨ D_i2 ∨ ... ∨ D_ik)

(2)  Replace each variable or infinite repetition factor by *, indicating
     zero or more occurrences.

(3)  Expand  out  each  constant  repetition  factor. 

We then have two regular languages, D and F respectively, for the D-list
and F-list. We then see:

(1)  If D ⊆ F, then the correspondence is valid.

(2)  If D ∩ F is empty, then the correspondence cannot be valid.

(3)  In  all  other  cases,  the  validity  of  the  correspondence  cannot  be 
determined. 

These statements follow from the observation that the sentences in D are
all the possible sequences of D-items, while the sentences in F are
obtained by taking all the possible sequences of F-items (a necessarily
infinite set) and replacing each F-item by all possible D-items that it
can match. Now the relation between D and F can be algorithmically
determined since the containment and intersection problems for regular
languages are solvable (again, see Hopcroft and Ullman). It follows that
the correctness problem is indeed solvable. Since FORTRAN has no variable
repetition factors in its formats, the correctness problem can be solved
for that language, in the sense that we can determine which of the three
cases given above is applicable. For PL/I, it cannot be solved except for
formats having constant repetition factors.

Although  formal  language  theory  shows  that  the  correctness  problem 
is  solvable,  and  even  provides  an  algorithm  for  solution,  that  algorithm 
is not a practical one. The formal solution requires that all constant
repetitions be fully expanded, and moreover requires that we construct the
product of two finite-state machines and then test the language defined
by the product for emptiness. For a practical algorithm, we use the same
methods as we use for the other applications, and actually find the
correspondence between data items and format items.

5.   Method  of  Solution 

A solution to the correspondence problem can be expressed by replacing
each D_i in a data list by a pair <D_i,F_j>, where F_j is the format that
matches D_i. First, we define the inner cardinality of a repeated sequence
to be the number of individual items in the immediately contained nonrepeated
sequence, with repetitions counted. For instance, the inner cardinality of
3[2[D_1],5[D_2]] is 7. The inner cardinality is variable if the sequence
contains any variable repetition factors. The inner cardinality can
be computed in an obvious way by analyzing nested repeated sequences from
the inside out.
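
The inside-out computation can be sketched in Python as follows. The representation is an assumption made for this illustration only: an individual item is a string, a repeated sequence is a tuple (factor, components), a variable repetition factor is a string, and math.inf is an infinite factor. The function names are hypothetical, and None is used here to signal a variable inner cardinality.

```python
import math


def item_count(components):
    """Count the individual items of a nonrepeated sequence, repetitions included.

    Returns None when a variable repetition factor makes the count variable.
    """
    total = 0
    for c in components:
        if isinstance(c, str):            # individual item
            total += 1
            continue
        factor, body = c                  # nested repeated sequence
        if isinstance(factor, str):       # variable repetition factor
            return None
        inner = item_count(body)
        if inner is None:
            return None
        total += factor * inner
    return total


def inner_cardinality(repeated):
    """Inner cardinality of a repeated sequence; its own factor is not counted."""
    _factor, body = repeated
    return item_count(body)


if __name__ == "__main__":
    # 3[2[D_1],5[D_2]] from the text
    print(inner_cardinality((3, [(2, ["D1"]), (5, ["D2"])])))
```

Note that the repetition factor of the sequence itself plays no part: only the contents of the immediately contained nonrepeated sequence are counted.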

We  present  the  algorithm  as  a  sequence  of  operations,  using  a  semiformal 
style of English adopted from the recent PL/I standard [2]. The algorithm
is executed by performing the operation match, whose inputs are a D-list
and an F-list, and whose output is a DF-list, i.e., a list of pairs. The
algorithm proceeds by a sequence of reductions. When both the D-list and
the F-list begin with a single item, we can remove those items from the
two lists and construct a new item for the DF-list. Moreover, if both
the D-list and the F-list start with a repeated sequence, and the two
sequences both have the same repetition factor and the same inner cardinality,
then we can add a corresponding repeated sequence to the DF-list, applying
match recursively to obtain the inner nonrepeated sequence. (It is this
recursion  that  enables  us  to  retain  most  of  the  iterative  structure  of 
the  original  lists.)   The  rest  of  the  algorithm  is  concerned  with  modifying 
the  D-list  and  the  F-list  so  as  to  get  them  into  a  form  in  which  the 
initial  components  can  be  paired  up  as  we  have  just  described. 

In  certain  cases,  when  variable  repetition  factors  are  encountered, 
the  correspondence  between  the  D-list  and  the  F-list  cannot  be  determined 
until execution. In these cases, the algorithm rejects the input. To
see that variable repetition factors can cause this difficulty, consider
the case:

      D-list:   [V_1[D_1],D_2]
      F-list:   [F_1,F_2]

This case is a translation of the example given in the Introduction; the
proper pairing of D_2 depends on the value of V_1. On the other hand, some
cases involving variable repetition factors can be treated. For instance,
the pair:

      D-list:   [V_1[D_1]]
      F-list:   [∞[F_1]]

yields the DF-list [V_1[<D_1,F_1>]].

match(ds,fs)
    where ds is a D-list and fs is an F-list
    Result: a DF-list
    Note: fs will always include at least as many items as ds.

Step 1. Let dfs be an empty list.

Step 2. Perform Step 2.1 repeatedly until ds is empty. Then return dfs
    as the value of match.

Step 2.1. Let cde and cfe be, respectively, the first component of ds
    and of fs.

Case 2.1.1. cde and cfe are both individual items.
    Append the pair <cde,cfe> to dfs. Delete cde from ds and
    delete cfe from fs.
    Example: ds = [D_1,2[D_2]]
             fs = [F_1,∞[F_2]]
             new pair = <D_1,F_1>

Case 2.1.2. Either cde or cfe is an individual item, while the other is
    a repeated sequence with a constant or infinite repetition
    factor.
    If cfe is the individual item, perform split(1,ds) to
    obtain a new ds. Otherwise perform split(1,fs) to obtain a
    new fs.
    Example: ds = [D_1,2[D_2]]
             fs = [6[F_1],∞[F_2]]
             new fs = [F_1,5[F_1],∞[F_2]]

Case 2.1.3. Either cde or cfe is an individual item, while the other has
    a variable repetition factor.
    The input is rejected.
    Example: ds = [V_1[D_1]]
             fs = [F_1,∞[F_2]]
    Note that in this example, V_1 may or may not be greater than 0.

Case 2.1.4. cde and cfe are both repeated sequences with the same inner
    cardinality.
    Let rd and rf be the repetition factors of cde and cfe
    respectively.

Case 2.1.4.1. rd and rf are identical.
    Let nsd and nsf be the nonrepeated sequences in cde and
    cfe respectively. Perform match(nsd,nsf) to obtain a DF-list,
    dfl. If rd is one, then append dfl to dfs; otherwise,
    append rd[dfl] to dfs. Delete cde and cfe from ds and fs
    respectively.
    Example: ds = [5[D_1,D_2]]
             fs = [5[2[F_1]],∞[F_2]]
             new component of DF-list = 5[<D_1,F_1>,<D_2,F_1>]

Case 2.1.4.2. rd and rf are different constants, or rf is infinite.
    If rd < rf, perform split(rd,fs) to obtain a new fs.
    Otherwise, perform split(rf,ds) to obtain a new ds.
    Example: ds = [4[D_1],D_2]
             fs = [∞[F_1]]
             new fs = [4[F_1],∞[F_1]]

Case 2.1.4.3. rd is variable and rf is infinite.
    Perform split(rd,fs) to obtain a new fs.
    Example: ds = [V_1[D_1]]
             fs = [∞[F_1]]
             new fs = [V_1[F_1],∞[F_1]]

Case 2.1.4.4. (Otherwise.)
    The input is rejected.
    Example: ds = [V_1[D_1]]
             fs = [3[F_1],∞[F_2]]

Case 2.1.5. cde and cfe are both repeated sequences with different, but
    constant, inner cardinalities, nd and nf respectively.
    Let lcm be the least common multiple of nd and nf, and let
    md = lcm/nd, mf = lcm/nf.* Let rd and rf be the repetition factors
    of cde and cfe respectively. Let nr = min(rd/md,rf/mf) if
    neither rd nor rf is variable, and let nr be undefined otherwise.

    *We use "/" to indicate integer division with the remainder discarded.

Case 2.1.5.1. nr is defined and nr ≥ 1.

Step 2.1.5.1.1. If rd > nr*md, perform split(nr*md,ds) to obtain a new ds.
    If rf > nr*mf, perform split(nr*mf,fs) to obtain a new fs.
    (nr will be the new repetition factor for the first component
    both of ds and of fs.)
    Note: It is possible that zero, one or two split operations will be
    performed in this step.

Step 2.1.5.1.2. If md > 1, replace the first component of ds by nr[md[s]],
    where s is the nonrepeated sequence of cde.

Step 2.1.5.1.3. If mf > 1, replace the first component of fs by nr[mf[s]],
    where s is the nonrepeated sequence of cfe.
    Note: On the next step, Case 2.1.4.1 will apply, since both ds and fs
    will start with a component having repetition factor nr and inner
    cardinality lcm.
    Example: ds = [8[D_1,D_2]]
             fs = [∞[2[F_1],F_2]]
             nd=2, nf=3, lcm=6, md=3, mf=2, rd=8, rf=∞, nr=2
             new ds = [2[3[D_1,D_2]],2[D_1,D_2]]
             new fs = [2[2[2[F_1],F_2]],∞[2[F_1],F_2]]

Case 2.1.5.2. nr is defined and nr < 1.
    If nd > nf, perform split(1,ds); otherwise perform split(1,fs).
    Note: In this case, one or both of the first components of ds and fs contains
    too few elements to allow us to extract a common repeated part, so
    we expand the longer one. If necessary, the shorter one will be
    expanded on the next iteration.
    Example: ds = [3[D_1]]
             fs = [∞[F_1,4[F_2]]]
             nd=1, nf=5, lcm=5, md=5, mf=1, rd=3, rf=∞, nr=0
             new fs = [F_1,4[F_2],∞[F_1,4[F_2]]]

Case 2.1.5.3. rd is variable, rf is infinite, and nd is a multiple k of nf.
    Let s be the nonrepeated sequence of cfe. Replace cfe by
    rd[k[s]],∞[s].
    Note: Both ds and fs now start with a component with repetition factor
    rd and inner cardinality nd.
    Example: ds = [V_1[D_1,D_2]]
             fs = [∞[F_1]]
             new fs = [V_1[2[F_1]],∞[F_1]]

Case 2.1.5.4. rd is variable, but Case 2.1.5.3 does not apply.
    Reject the input.
    Example: ds = [V_1[D_1]]
             fs = [∞[F_1,6[F_2]]]
    Note: Although in practice it may be possible to solve this case, the
    solution cannot be expressed in our formalism. The solution would
    be:
             [∞[<D_1,F_1>,6[<D_1,F_2>]]]
    with an auxiliary test needed to ensure that only V_1 elements are
    processed.

Case 2.1.6. (Otherwise.)
    Reject the input.
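
The quantities used in Case 2.1.5 can be computed as in the following Python sketch. It covers only the arithmetic, not the list manipulation, and rests on our own conventions: the function name regroup_factors is hypothetical, a variable repetition factor is passed as a string, math.inf stands for an infinite factor, and None stands for an undefined nr.

```python
import math


def regroup_factors(nd, nf, rd, rf):
    """Return (lcm, md, mf, nr) for inner cardinalities nd, nf and factors rd, rf.

    nr is None (undefined) when either repetition factor is variable.
    """
    lcm = nd * nf // math.gcd(nd, nf)
    md, mf = lcm // nd, lcm // nf

    def whole_groups(r, m):
        # Integer division with the remainder discarded; an infinite
        # repetition factor stays infinite.
        return math.inf if r == math.inf else r // m

    if isinstance(rd, str) or isinstance(rf, str):
        nr = None
    else:
        nr = min(whole_groups(rd, md), whole_groups(rf, mf))
    return lcm, md, mf, nr


if __name__ == "__main__":
    print(regroup_factors(2, 3, 8, math.inf))   # values of the Case 2.1.5.1 example
    print(regroup_factors(1, 5, 3, math.inf))   # values of the Case 2.1.5.2 example
```

The two calls reproduce the values listed in the examples for Cases 2.1.5.1 and 2.1.5.2; the infinite factor is handled separately because floating-point floor division of infinity does not yield infinity in Python.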

split(k,s)
    where k is an integer and s is a nonrepeated sequence.
    Result: a modified nonrepeated sequence.

Step 1. Let c be the first component of s. c must be a repeated sequence,
    so it has repetition factor r and contains a nonrepeated sequence
    cs.

Step 2. Let k2 be r - k. (Note that ∞ minus anything is ∞.)
    Replace c by the two components k[cs],k2[cs].

Step 3. If either k or k2 is 1, replace the corresponding component by cs,
    i.e., delete the repetition factor.
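
A minimal Python sketch of the split operation is given below. It is an illustration under our own assumptions, not the SNOBOL implementation: an individual item is a string, a repeated sequence is a tuple (factor, components), math.inf stands for an infinite factor, and deleting a repetition factor of 1 is interpreted as splicing the items of cs directly into the enclosing sequence, which agrees with the example of Case 2.1.2.

```python
import math


def split(k, s):
    """Return a copy of s whose first component r[cs] is replaced by k[cs], (r-k)[cs].

    A resulting repetition factor of 1 is dropped by splicing cs in place.
    The case k == r is not expected and is not handled.
    """
    r, cs = s[0]                      # Step 1: first component must be repeated
    k2 = math.inf if r == math.inf else r - k   # Step 2: infinity minus k is infinity
    pieces = []
    for factor in (k, k2):
        if factor == 1:               # Step 3: delete a repetition factor of 1
            pieces.extend(cs)
        else:
            pieces.append((factor, cs))
    return pieces + list(s[1:])


if __name__ == "__main__":
    # fs = [6[F_1],∞[F_2]] from the example of Case 2.1.2
    print(split(1, [(6, ["F1"]), (math.inf, ["F2"])]))
```

Because the function returns a new list rather than modifying s, a caller written in this style would rebind ds or fs to the result, mirroring the phrase "to obtain a new fs" in the algorithm.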

An example of the algorithm applied to a compound case is shown
in Figure 1. Two smaller examples, omitting the intermediate steps, are:

Example 2

      data list:       [3[5[D_1,D_2],D_2,D_1]]
      format list:     [3[6[F_1,F_2]],∞[6[F_1,F_2]]]
      resulting list:  [3[5[<D_1,F_1>,<D_2,F_2>],<D_2,F_1>,<D_1,F_2>]]

Example 3

      data list:       [200[D_1,5[D_2],5[D_3]]]
      format list:     [∞[F_1,5[F_2],5[F_3]]]
      resulting list:  [200[<D_1,F_1>,5[<D_2,F_2>],5[<D_3,F_3>]]]

6.   Treatment of Control Formats

The  formal  model  we  have  presented  does  not  account  for  control  formats, 
e.g., Hollerith fields in FORTRAN formats and skips to the next record.
However,  control  formats  are  easily  accounted  for  by  associating  them  with 
data  formats.   For  applications  other  than  optimization,  control  formats 
are  irrelevant  and  can  be  ignored. 

In  associating  control  formats  with  data  formats,  we  must  distinguish 
between  control  formats  that  are  executed  only  if  a  following  data  format 
is  used,  and  those  that  are  executed  whenever  the  preceding  data  format  is 
used. In FORTRAN, the rule is that following control formats are
executed unless the end of the entire format is encountered. Thus, if
we execute:

      WRITE(u,10)A,B
10    FORMAT (1H1, I5, 1H*, I5, 1H*)

both stars are printed. Hence the first I5 has two control formats (1H1 and
1H*) associated with it, while the second I5 has the second 1H* associated
with it.

In FORTRAN, we must also account for the peculiar behavior of end-of-line.
An end-of-line is generated whenever the right end of the format is
encountered.   Hence  a  line  skip  must  be  associated  as  a  post-format  for  the 
last  data  format  in  the  list.   A  line  skip  also  occurs  at  the  completion  of 
the  entire  operation,  unless  one  has  just  been  produced;  this  final  skip 
can  be  generated  independently  of  our  algorithm. 

In PL/I, control formats are not used unless the following data format
is also used. Hence in PL/I, all control formats are associated as
pre-formats with data formats.

7.   Actual Experience

To demonstrate the effectiveness of the algorithm in determining
data-format correspondence, a group of programs was analyzed. Thirteen programs
were chosen, all written in FORTRAN. The programs were selected randomly;
listings were obtained from the graduate students available one Saturday
afternoon. Some of the programs were large and had been coded by numerous
people. In all, the programs were the work of about 25 programmers.

Two  hundred  and  fifty-one  data  and  format  statements  were  examined  and 
only  fifteen  could  not  be  completely  analyzed  by  the  algorithm.  Thus,  this 
technique  failed  in  only  six  percent  of  the  cases  examined. 

A few observations about the formats are also of interest. About 25
percent of the data lists were empty and, thus, the format list had only
format control information. About 40 percent of the data and format lists
examined could be analyzed completely by using just Case 1 of the algorithm.
None of the examined lists were as complex as those presented in the previous
examples;  none  required  the  use  of  Case  2.1.5.   Though  PL/I  programs  were 
not  analyzed,  we  have  no  reason  to  believe  the  results  would  be  substantially 
different. 